Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lark sheets connector #17218

Merged
merged 1 commit into from
Mar 9, 2022
Merged

Add lark sheets connector #17218

merged 1 commit into from
Mar 9, 2022

Conversation

7c00
Copy link
Member

@7c00 7c00 commented Jan 24, 2022

This pull introduces Lark Sheets Connector, which is inspired by Google Sheets Connector. Though there are tiny differences between them.

Developers could run LarkSheetsQueryRunners#main, and then follow the QueryModel section from the doc to try the connector.

Test plan

  • Integation test
  • Unit test
== RELEASE NOTES ==

Connectors
* Introduce Lark sheets connector.

@7c00 7c00 force-pushed the presto-lark branch 4 times, most recently from c7198fd to 65ac8f6 Compare January 26, 2022 10:04
@7c00 7c00 marked this pull request as ready for review January 26, 2022 10:17
@7c00 7c00 changed the title WIP: Add lark sheets connector Add lark sheets connector Jan 26, 2022
@7c00 7c00 force-pushed the presto-lark branch 2 times, most recently from 020e181 to 6f96def Compare January 27, 2022 02:08
@7c00
Copy link
Member Author

7c00 commented Jan 27, 2022

Hello, @ChunxuTang @beinan. Could you help review this PR?

@v-jizhang v-jizhang self-requested a review January 27, 2022 15:07
@rohanpednekar
Copy link
Contributor

Thanks, @7c00 for your contribution, we were planning to request @ChunxuTang and @beinan as they helped us earlier to review the google sheet connector.

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @7c00, thanks a lot for your work! This looks to be a very nice feature, and the Presto community could benefit.
Just left some suggestions/comments. For the first round of review, I mainly focused on the coding styles. Thanks again for your contributions!

@@ -0,0 +1 @@
ewogICJhcHAtc2VjcmV0IiA6ICJYSHdrckVna0N6Y3dFZ0NDNlpQQjRnWXZtb25OSmdNTiIKfQ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a TTL for this secret? And is there an owner email account? There's a potential risk here that when the secret expires, it may break presto CI workflows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no TTL for the secret. The secret is not bound to an email account.
I wonder if we should disable TestSheetsIntegration by default as TestGoogleSheets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we may need to disable it.
Hi @beinan, could you help disable this integration test by default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's disable the test, otherwise it would be too flaky I guess

We could use @Test(enabled = false) in the integration test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabled all the tests in TestLarkSheetsIntegration.

@7c00 7c00 force-pushed the presto-lark branch 2 times, most recently from d1c4dd5 to 91d0063 Compare February 7, 2022 09:09
@7c00
Copy link
Member Author

7c00 commented Feb 7, 2022

@ChunxuTang Thanks a lot for your comments. I haved update the code. Could you please review it again?

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @7c00, thanks for your revision!
Left a few minor suggestions/comments.


connector.name=lark-sheets
app-domain=FEISHU
app-id=cli_a16fc7ddf4b9900d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering is this a real app id?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a real app id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering as this is a real ID, is it OK to expose it in an example file? Maybe just use exampleAppId?
Your call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to a fake app id as example_app_id.

presto-docs/src/main/sphinx/connector/larksheets.rst Outdated Show resolved Hide resolved

The ``token`` property is taken from the last path segment of the spreadsheet url, e.g.
``shtcnBf5pg4BNSkwV2Ku5xwW9Pf`` for
``https://test-ch80md45anra.feishu.cn/sheets/shtcnBf5pg4BNSkwV2Ku5xwW9Pf?sheet=MT1p4I``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a real spreadsheet. Will the link expire?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it will not expire.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it more readable, I might use a pattern string rather than a real link. e.g. https://{account_id}.feishu.ch/sheet/{token}/sheet={sheet_id}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the pattern string. And give an example as well.

Comment on lines 81 to 82
public class SimpleSheetsApi
implements SheetsApi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering as the CachingSheetsApi is implemented in a separate file, why do you prefer to implement the SimpleSheetsApi in the factory class? Any specific reasons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible some of my perference during development I guess. Have rafactored SimpleSheetsApi to a top level class to make it matched the code style of CachingSheetsApi.

@@ -0,0 +1,70 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor suggestion: As you added some utility functions to SheetUtil, would you mind also testing them here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Added tests for new functions here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you add them? Forgot to commit changes?
It doesn't show any changes here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests was added to another class by mistake. Have corrected this and moved the test to TestLarkSheetsUtil.

@7c00
Copy link
Member Author

7c00 commented Feb 16, 2022

@ChunxuTang Thanks for your comments. ❤️
Have updated the code. Could you please take a review again? Thanks in advance.

@ChunxuTang
Copy link
Member

@ChunxuTang Thanks for your comments. ❤️ Have updated the code. Could you please take a review again? Thanks in advance.

Sure! I'll take another round of review very soon this week.

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@7c00 Thanks for your contributions!

As you created this PR some time ago and it is now left a bit behind the master branch, could you merge master into this feature branch to avoid conflicts?

@7c00 7c00 force-pushed the presto-lark branch 3 times, most recently from 9866ddf to b8cd8f8 Compare February 21, 2022 03:25

The ``token`` property is taken from the last path segment of the spreadsheet url, e.g.
``shtcnBf5pg4BNSkwV2Ku5xwW9Pf`` for
``https://test-ch80md45anra.feishu.cn/sheets/shtcnBf5pg4BNSkwV2Ku5xwW9Pf?sheet=MT1p4I``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it more readable, I might use a pattern string rather than a real link. e.g. https://{account_id}.feishu.ch/sheet/{token}/sheet={sheet_id}


import static java.util.Objects.requireNonNull;

public class SheetsConnector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more consistent, can we use LarkSheetsConnector as the class name? Similarly, other class names should also use LarkSheets

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine. Renamed to LarkSheetsConnector and prefixed other class with Lark as well.

@@ -0,0 +1 @@
ewogICJhcHAtc2VjcmV0IiA6ICJYSHdrckVna0N6Y3dFZ0NDNlpQQjRnWXZtb25OSmdNTiIKfQ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's disable the test, otherwise it would be too flaky I guess

We could use @Test(enabled = false) in the integration test.

@7c00 7c00 force-pushed the presto-lark branch 3 times, most recently from fb67c06 to aefb67a Compare February 21, 2022 11:53
@7c00
Copy link
Member Author

7c00 commented Feb 21, 2022

@ChunxuTang @beinan Thanks for your comments. I have rebased my branch with master and updated the code as the comments. Could you please review it again? Thanks in advance.

@7c00 7c00 requested a review from beinan February 21, 2022 12:23
Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just left a couple comments for some minor issues, I will continue review this giant pr in this week. Many thanks for this create contribution!

Comment on lines 70 to 71
LarkSheetsApi api = apiFactory.get();
LarkSheetsApi cachingApi = new CachingLarkSheetsApi(api);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline api?

Suggested change
LarkSheetsApi api = apiFactory.get();
LarkSheetsApi cachingApi = new CachingLarkSheetsApi(api);
LarkSheetsApi cachingApi = new CachingLarkSheetsApi(apiFactory.get());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

SCHEMA_TOKEN_NOT_PROVIDED(0x03, USER_ERROR),
SCHEMA_NOT_EXISTS(0x04, USER_ERROR),
SHEET_NAME_AMBIGUOUS(0x04, USER_ERROR),
SHEETS_API_ERROR(0x10, EXTERNAL),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this plural? if the plural is necessary, it might be better to follow alphabetical order

Copy link
Member Author

@7c00 7c00 Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a typo. Have reorderred the error codes to three catagories.

Comment on lines 28 to 31
SCHEMA_ALREADY_EXISTS(0x03, USER_ERROR),
SCHEMA_TOKEN_NOT_PROVIDED(0x03, USER_ERROR),
SCHEMA_NOT_EXISTS(0x04, USER_ERROR),
SHEET_NAME_AMBIGUOUS(0x04, USER_ERROR),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why some of the error are sharing the same error code? the 0x03 and 0x04 are used two times here.

Copy link
Member Author

@7c00 7c00 Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a typo. Have reorderred the error codes to three catagories.

}
else if (matched.size() > 1) {
throw new PrestoException(LarkSheetsErrorCode.SHEET_NAME_AMBIGUOUS,
format("Ambiguous name %s in spreadsheet %s: matched sheets: %s", sheetName, token, matched));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we throw an exception with the token? will there be any security concern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the raw tokens in exception messages might cause data leak. Have masked the token before writing the token.

public List<SchemaTableName> listTables(ConnectorSession session, Optional<String> schemaName)
{
if (!schemaName.isPresent()) {
throw new PrestoException(NOT_PERMITTED, "Schema is required to list tables");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: any default schema for lark/feishu?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default schema is not supported, since we dont know which spreadsheet shoud be associated with the schema.

public void renameSchema(ConnectorSession session, String source, String target)
{
LarkSheetsSchema schema = requireVisibleSchema(session, source);
checkSchemaUpdatable(schema, session.getUser(), "rename");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to use constant or enum for the operations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation parameter in checkSchemaUpdatable is used for generating an error message. So I think it might be not neccessary to define constants of enums for it.

LinkedHashMap<String, LarkSheetsColumnHandle> columns = new LinkedHashMap<>(numColumns);
for (int i = 0; i < numColumns; i++) {
String rawColumnName = header.get(i);
if (rawColumnName == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just thinking if we could use the column index when the column name is null. Your call.

Copy link
Member Author

@7c00 7c00 Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column name can be a sequence of any UTF-8 characters. So it might be hard to define the column index name spec. For example, using $2 as the column index to reference the second column, it might conflict with another column with the name of $2. In the other hand, updating a spreadsheet to fill a column name is very easy for end users. So column index name is not be supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks for the clarification!

@7c00
Copy link
Member Author

7c00 commented Feb 25, 2022

@ChunxuTang @beinan Hi, thanks for your review. ❤️ Do we need more review before merge this pr?

Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good to me except two very minor issues.

this.api = requireNonNull(api, "api is null");
this.split = requireNonNull(split, "split is null");
this.readColumns = requireNonNull(readColumns, "columns is null");
this.indexMapping = createIndexMapping(readColumns);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary the ".this"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Used for fast switch to enable or disable tests
private static final boolean TEST_ENABLED = false;
private static final String CATALOG = "larksheets";
// https://test-ch80md45anra.feishu.cn/sheets/shtcnBf5pg4BNSkwV2Ku5xwW9Pf?sheet=MT1p4I
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this comment for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This url points to the spreadsheet used for this integration test. The TESTING_TOKEN is extracted from the url. I have added comments to make it more clear.

@7c00 7c00 force-pushed the presto-lark branch 2 times, most recently from 08e82ab to ee0facb Compare February 28, 2022 06:32
@7c00 7c00 requested a review from beinan February 28, 2022 07:42
@7c00
Copy link
Member Author

7c00 commented Mar 3, 2022

ping @beinan @ChunxuTang

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @7c00 Thanks for your nice work!

@7c00
Copy link
Member Author

7c00 commented Mar 7, 2022

Rebased with master branch.

@7c00 7c00 force-pushed the presto-lark branch 2 times, most recently from 5585ba4 to 0f6088f Compare March 8, 2022 03:52
Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, rerun the test

@7c00
Copy link
Member Author

7c00 commented Mar 9, 2022

Thanks @beinan ❤️ . This CI finished. Could we merge this PR?

@beinan beinan merged commit 8750895 into prestodb:master Mar 9, 2022
@7c00
Copy link
Member Author

7c00 commented Mar 10, 2022

Thanks @beinan @ChunxuTang for your time to review!

@7c00 7c00 deleted the presto-lark branch March 10, 2022 01:55
@varungajjala varungajjala mentioned this pull request Mar 22, 2022
9 tasks
@asjadsyed asjadsyed mentioned this pull request Mar 23, 2022
9 tasks
@asjadsyed asjadsyed mentioned this pull request Apr 1, 2022
8 tasks
@rohanpednekar
Copy link
Contributor

@7c00 Do you think you can provide short documentation on how to use this connector that we can publish at https://prestodb.io/docs/current/connector.html ?

@7c00
Copy link
Member Author

7c00 commented Apr 26, 2022

@rohanpednekar Yes. The connector doc is included in the PR. It's https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/larksheets.rst. It's a bit strange that it's not displayed in prestodb website.

@rohanpednekar
Copy link
Contributor

Thanks @7c00 for the confirmation, I guess this ll get published with the 0.272 release notes PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants